sort: Optimize sort collation for long lines #12144
mattsu2020 wants to merge 2 commits into uutils:main from mattsu2020:fix_sort_performance
GNU testsuite comparison:
Merging this PR will degrade performance by 23.24%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | sort_key_field[500000] | 767.8 ms | 804.6 ms | -4.57% |
| ❌ | Memory | sort_german_de_locale | 3.3 MB | 4.3 MB | -23.24% |
Comparing mattsu2020:fix_sort_performance (8c85e7f) with main (485b156)
Footnotes
- 46 benchmarks were skipped, so the baseline results were used instead.
Out of interest, why choose 1 MiB as the limit, rather than something lower like |
Measurements with a 64 KiB threshold showed performance at least equivalent to 1 MiB on the issue workload, so I will change the threshold to `u16::MAX` (65,535 bytes).
@mattsu2020 Could you also add a benchmark (in a separate PR)?
Sure, I’ll keep this PR focused on the fix and open a separate PR adding a benchmark for long-line locale collation. |
23e4bb3 to 8c85e7f
What changed
Why
Fixes #12138. In UTF-8 locales, `sort` precomputed ICU collation keys for every input line. For inputs with a small number of very large lines, such as 26 lines of 200 MiB each, the cost of generating and storing multi-GiB collation keys dominated runtime.
Impact
Small and normal-sized lines keep the existing precomputed-key fast path. Very long lines skip the expensive key materialization and use `locale_cmp` when compared.
Validation
- `cargo check -p uu_sort`
- `cargo test -p uu_sort`
- `cargo test -p coreutils --test tests test_sort::test_default_unsorted_ints -- --exact`
- `cmp` for 52 MiB and 130 MiB reproducer inputs.
LC_ALL=en_US.UTF-8 --parallel 1 --buffer-size 8G:
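For readers who want the shape of the change described under Impact, here is a minimal sketch of a length-gated key cache. It is not the code from this PR: `MAX_PRECOMPUTED_KEY_LEN`, `Line`, `make_key`, `compare_directly`, `compare_lines` and `sort_lines` are hypothetical names, and ASCII case folding stands in for ICU collation so the example needs no external crates. `compare_directly` plays the role that `locale_cmp` plays in the description.

```rust
use std::cmp::Ordering;

/// Hypothetical threshold: lines longer than this get no precomputed key.
/// The value mirrors the `u16::MAX` limit discussed in the conversation.
const MAX_PRECOMPUTED_KEY_LEN: usize = u16::MAX as usize;

/// Stand-in for building a collation sort key (assumption: not ICU).
/// It allocates a buffer as large as the line, which is the cost to avoid.
fn make_key(line: &[u8]) -> Vec<u8> {
    line.to_ascii_lowercase()
}

/// Stand-in for a direct, allocation-free locale comparison.
/// It must order lines exactly as comparing their `make_key` results would.
fn compare_directly(a: &[u8], b: &[u8]) -> Ordering {
    a.iter()
        .map(u8::to_ascii_lowercase)
        .cmp(b.iter().map(u8::to_ascii_lowercase))
}

/// A line plus its optional precomputed key.
struct Line<'a> {
    text: &'a [u8],
    key: Option<Vec<u8>>,
}

impl<'a> Line<'a> {
    fn new(text: &'a [u8]) -> Self {
        // Only short lines pay the up-front key cost.
        let key = (text.len() <= MAX_PRECOMPUTED_KEY_LEN).then(|| make_key(text));
        Line { text, key }
    }
}

/// Fast path when both keys exist; otherwise compare the raw text.
fn compare_lines(a: &Line, b: &Line) -> Ordering {
    match (&a.key, &b.key) {
        (Some(ka), Some(kb)) => ka.cmp(kb),
        _ => compare_directly(a.text, b.text),
    }
}

/// Sorts borrowed lines and returns them in collation order.
fn sort_lines<'a>(input: &[&'a [u8]]) -> Vec<&'a [u8]> {
    let mut lines: Vec<Line<'a>> = input.iter().map(|&t| Line::new(t)).collect();
    lines.sort_by(|a, b| compare_lines(a, b));
    lines.into_iter().map(|l| l.text).collect()
}
```

The sketch only works because both paths define the same ordering. If the key comparison and the direct comparison could disagree, mixing them in one sort would no longer give a consistent total order.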